Speaker recognition by means of acoustic and phonetically informed GMMs
Authors
Abstract
In this work we assess the recently proposed hybrid Deep Neural Network/Gaussian Mixture Model (DNN/GMM) approach to speaker recognition, considering the effects of the granularity of the phonetic DNN model and of the precision of the corresponding GMMs, which we refer to as the phonetic GMMs. The aim is to better understand the contribution of the phonetic information provided by the DNN relative to the accuracy with which the acoustic GMMs fit the distribution of the features associated with a given context-dependent phone state. The testbed was the text-independent speaker recognition task defined by NIST for the 2012 Speaker Recognition Evaluation. Our experiment confirms that the acoustic and the phonetic GMMs are complementary: their score combination yields very good results, provided that the DNN is trained on data collected in an environment similar to the test environment. We show, however, that using a single Gaussian per DNN state is not the best choice: the best single system was obtained by balancing the phonetic and acoustic precision of a DNN/GMM system.
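The abstract does not say how the scores of the acoustic and phonetic GMM systems were combined. As a minimal sketch, assuming a weighted linear fusion of standardised per-trial scores, the combination could look like the following. The function name `fuse_scores` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_scores(acoustic, phonetic, weight=0.5):
    """Linearly combine per-trial scores from two systems.

    `weight` is the share given to the acoustic system. Both the
    parameter and the linear rule are assumptions made for this
    sketch; the source does not state the fusion method.
    """
    acoustic = np.asarray(acoustic, dtype=float)
    phonetic = np.asarray(phonetic, dtype=float)
    if acoustic.shape != phonetic.shape:
        raise ValueError("score arrays must have the same shape")

    def standardise(scores):
        # Zero mean, unit variance, so that neither system dominates
        # the sum purely because of its score scale.
        std = scores.std()
        centred = scores - scores.mean()
        return centred / std if std > 0 else centred

    return weight * standardise(acoustic) + (1.0 - weight) * standardise(phonetic)
```

In practice the fusion weight would be tuned on held-out trials rather than fixed at 0.5.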
Related resources
Pitch-dependent GMMs for text-independent speaker recognition systems
Gaussian mixture models (GMMs) and ergodic hidden Markov models (HMMs) have been successfully applied to model short-term acoustic vectors for speaker recognition systems. Prosodic features are known to carry information concerning the speaker’s identity and they can be combined with the short-term acoustic vectors in order to increase the performance of the speaker recognition system. In this ...
Fuzzy Gaussian mixture models for speaker recognition
A fuzzy clustering based modification of Gaussian mixture models (GMMs) for speaker recognition is proposed. In this modification, fuzzy mixture weights are introduced by redefining the distances used in the fuzzy c-means (FCM) functionals. Their reestimation formulas are proved by minimising the FCM functionals. The experimental results show that the fuzzy GMMs can be used in speaker recogniti...
Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition
This paper investigates the effectiveness of the DAEM (Deterministic Annealing EM) algorithm in acoustic modeling for speaker and speech recognition. Although the EM algorithm has been widely used to approximate the ML estimates, it has the problem of initialization dependence. To relax this problem, the DAEM algorithm has been proposed, and its effectiveness has been confirmed in small artificial tasks....
GMM based clustering and speaker separability in the Timit speech database
Speaker recognition on the 630-speaker TIMIT speech database, using maximum probability selection with a simple Gaussian Mixture Model (GMM) for the data distribution of each speaker, gives above 99% correct recognition. In contrast, a powerful classifier such as a Multi-Layer Perceptron (MLP), trained to estimate speaker probabilities, even on a small subset of speakers often performs no bett...
Acoustic language identification using fast discriminative training
Gaussian Mixture Models (GMMs) in combination with Support Vector Machine (SVM) classifiers have been shown to give excellent classification accuracy in speaker recognition. In this work we use this approach for language identification, and we compare its performance with the standard approach based on GMMs. In the GMM-SVM framework, a GMM is trained for each training or test utterance. Since i...